Grid Analyzer! #81
Conversation
…ing_ analyser is failing
…e pulled from files
… with wrongly formatted values in grid-analyzer
Comment posted by LGTM.com:
This pull request introduces 3 alerts when merging 1ae12d3 into d8c20f1 - view on LGTM.com. New alerts:
…e, changed trainer command line argument handling - fixes #62)
… fix/grid-analyzer
It looks good! I haven't had the time to test it, so I will do so on Monday. This PR adds a dependency over
Sadly not; please review the solutions that I analyzed when fixing issue #52.
It looks good, thanks!
I have done some doc polishing and set a few basic methods of GridAnalyzer as static ones.
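For the record, by "static" I mean helpers that depend only on their arguments, along these lines (a minimal sketch; the method name below is just an illustration, not necessarily one from the actual class):

```python
import os


class GridAnalyzer:

    @staticmethod
    def check_if_file_exists(directory, filename):
        """Pure helper: it touches no instance state, so it can be static."""
        return os.path.isfile(os.path.join(directory, filename))
```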
I have tested the pipeline on Ubuntu CPUs (fixes #82) and GPUs (fixes #83), and it worked flawlessly.
Nonetheless, I have some remarks about the resulting CSV file:
- The `train_status` value does not always match the actual one reported in the `trainer.log`. Critically, some runs are indicated as "Not converged" when they actually did converge. I think this happens because we do not always save the model: we may save the model mid-training, so the saved training status differs from the final one.
- I find the key name `train_start` somewhat confusing when compared to `model_timestamp` (at first I thought we were not reporting the correct timestamp). Could we clarify this?
- We should report the aggregated statistics for the test, in place of the data point for the last episode; see the sketch after this list. I think this is more valuable to the user, and it shouldn't be hard to implement (since we have only one data point for the test, and the corresponding CSV file exists in all cases).
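For that last point, a minimal sketch of the aggregation I have in mind, assuming the test CSV is readable with pandas (the `loss` and `acc` column names are illustrative assumptions, not the actual headers produced by the tester):

```python
import pandas as pd


def aggregate_test_statistics(csv_path):
    """Summarize a test CSV instead of keeping only its last row.

    Sketch only: 'loss' and 'acc' are assumed column names, not
    the real headers written by the tester.
    """
    df = pd.read_csv(csv_path)
    # Report mean/std over all recorded test episodes, not just the final one.
    return {
        "test_loss_mean": df["loss"].mean(),
        "test_loss_std": df["loss"].std(),
        "test_acc_mean": df["acc"].mean(),
        "test_acc_std": df["acc"].std(),
    }
```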
Other than that, this is an important PR, and this is great work! 👍
…ged in online trainer 2) added saving model after last epoch in offline trainer 3) importing aggregated test statistics in grid-analyzer
…ts at the end of training (when epoch/episode limit is hit), modifies model save. Fixes #85
Finally, a working mip-grid-analyzer is here!
This required solving several issues and introducing many changes across the whole experimental pipeline, starting from:
In order to test the whole pipeline, I have also prepared a config for grid_trainer with LeNet5 and SimpleCNN models trained on MNIST (fixes #51).
The use case is as follows:

1. Run grid-trainer on the configuration file (2 models, 1 problem, 5 repetitions):
   `mip-grid-trainer-cpu --c configs/vision/grid_trainer_mnist.yaml`
2. Run grid-tester on the newly created directory containing 10 different models:
   `mip-grid-tester-cpu --e --expdir ./experiments_20181109_131153`
3. Run grid-analyzer to aggregate all the results in a single CSV file (a conceptual sketch of this step follows the list):
   `mip-grid-analyzer --e --expdir ./experiments_20181109_131153`
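Conceptually, the analyzer step boils down to something like the sketch below. This is an illustration only, not the actual implementation: the one-directory-per-run layout and the `training.csv` file name are assumptions.

```python
from pathlib import Path

import pandas as pd


def aggregate_experiments(expdir):
    """Merge per-run summary CSVs found under expdir into one CSV file."""
    frames = []
    for run_dir in sorted(Path(expdir).iterdir()):
        summary = run_dir / "training.csv"  # assumed per-run summary file
        if summary.is_file():
            df = pd.read_csv(summary)
            df["experiment"] = run_dir.name  # remember the originating run
            frames.append(df)
    result = pd.concat(frames, ignore_index=True)
    result.to_csv(Path(expdir) / "grid_analysis.csv", index=False)
    return result
```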
Attaching the resulting file:
20181109_134802_grid_analysis.xlsx
(What I have described above will in fact become a part of "reproducible research: MNIST Image Classification with LeNet-5" - issue #12.)
Besides that, I've made plenty of minor changes, such as: